Rank | Count | Beginning |
---|---|---|
3335 | 1303 | די |
1272 | 609 | אין |
4699 | 550 | דער |
8285 | 446 | ער |
7953 | 311 | עס |
71 | 259 | א |
2995 | 257 | דאס |
755 | 186 | און |
5858 | 170 | ווען |
113 | 165 | אבער |
9486 | 132 | רבי |
515 | 130 | אויך |
390 | 120 | אויב |
7519 | 117 | נאך |
6207 | 109 | זיין |
6125 | 99 | זיי |
8906 | 85 | פון |
7356 | 83 | מען |
9461 | 81 | ר' |
9676 | 75 | רוב |
5659 | 70 | ווי |
5399 | 64 | היינט |
6407 | 63 | זייערע |
6068 | 61 | זי |
7224 | 58 | מיט |
6319 | 57 | זיינע |
9621 | 47 | רבנות |
2592 | 44 | ביז |
1947 | 43 | איר |
7021 | 42 | לויט |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt
from sentences
group by substring_index(sentence, ' ', 1)
order by cnt desc
limit 50;
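The same count can be sketched outside the database, which may help when trying other values of N. The following Python snippet is only an illustration, not the code used to produce the table: the function name `sentence_beginnings` and the sample sentences are hypothetical, and it assumes sentences are plain strings with words separated by single spaces, as the query above does.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper mirroring
    substring_index(sentence, ' ', n) ... group by ... order by cnt desc.
    """
    counts = Counter()
    for sentence in sentences:
        # Split on single spaces, as the SQL delimiter ' ' does.
        words = sentence.split(" ")
        # Like substring_index, a sentence with fewer than n words
        # contributes its full text as the beginning.
        counts[" ".join(words[:n])] += 1
    # most_common sorts by descending count.
    return counts.most_common(limit)


# Made-up sample input, assumed for illustration only.
sample = ["the cat sat", "the dog ran", "a bird flew", "the cat slept"]
top_1 = sentence_beginnings(sample, n=1)
top_2 = sentence_beginnings(sample, n=2)
```

Changing `n` to 2, 3, or 4 gives the counts corresponding to the following subsections; in the SQL query the equivalent change is the third argument of `substring_index`.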
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV